TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain

Authors

  • Anoop Kunchukuttan
  • Rajen Chatterjee
  • Shourya Roy
  • Abhijit Mishra
  • Pushpak Bhattacharyya
Abstract

Building Statistical Machine Translation (SMT) systems requires large amounts of parallel corpora. We describe the TransDoop system for gathering translations to create parallel corpora from an online crowd workforce whose members are familiar with multiple languages but are not expert translators. Our system uses a Map-Reduce-like approach to translation crowdsourcing in which sentence translation is decomposed into the following smaller tasks: (a) translation of the constituent phrases of the sentence; (b) validation of the quality of the phrase translations; and (c) composition of complete sentence translations from the phrase translations. TransDoop incorporates quality control mechanisms and easy-to-use worker user interfaces designed to address issues with translation crowdsourcing. We have evaluated the crowd’s output using the METEOR metric. For a complex domain like judicial proceedings, the higher scores obtained by the map-reduce based approach compared to complete sentence translation establish the efficacy of our work.
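The three-stage decomposition described in the abstract can be illustrated with a minimal sketch. This is not the TransDoop implementation: the fixed-size chunking in `split_into_phrases`, the majority vote in `validate`, the in-order concatenation in `compose`, and the dictionary-backed workers built by `make_worker` are all hypothetical stand-ins chosen to keep the example self-contained. In the system the abstract describes, each stage is performed by crowd workers rather than by code.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical worker type: maps a source phrase to a candidate translation.
Worker = Callable[[str], str]


def make_worker(table: Dict[str, str]) -> Worker:
    """Simulate a crowd worker with a fixed lookup table (assumption)."""
    return lambda phrase: table[phrase]


def split_into_phrases(sentence: str, size: int = 2) -> List[str]:
    """Toy stand-in for phrase extraction: fixed-size word chunks."""
    words = sentence.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def translate_phrases(phrases: List[str], workers: List[Worker]) -> Dict[str, List[str]]:
    """(a) Map-like step: each worker proposes a translation for each phrase."""
    return {p: [w(p) for w in workers] for p in phrases}


def validate(candidates: Dict[str, List[str]]) -> Dict[str, str]:
    """(b) Validation: keep the most frequent candidate per phrase.

    Majority voting is an assumption standing in for human validation.
    """
    return {p: Counter(c).most_common(1)[0][0] for p, c in candidates.items()}


def compose(phrases: List[str], validated: Dict[str, str]) -> str:
    """(c) Reduce-like step: join validated phrases in source order.

    A human composer would also reorder and smooth the result.
    """
    return " ".join(validated[p] for p in phrases)


if __name__ == "__main__":
    careful = {"the court": "la cour", "dismissed the": "a rejeté", "appeal": "l'appel"}
    noisy = {"the court": "le tribunal", "dismissed the": "a rejeté", "appeal": "appel"}
    workers = [make_worker(careful), make_worker(careful), make_worker(noisy)]

    phrases = split_into_phrases("the court dismissed the appeal")
    validated = validate(translate_phrases(phrases, workers))
    print(compose(phrases, validated))  # prints: la cour a rejeté l'appel
```

Splitting the work this way keeps each unit small enough for a non-expert bilingual worker, and the validation step filters weak phrase translations before they reach sentence composition.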


Similar resources


IRT-based Aggregation Model of Crowdsourced Pairwise Comparison for Evaluating Machine Translations

Recent work on machine translation has used crowdsourcing to reduce costs of manual evaluations. However, crowdsourced judgments are often biased and inaccurate. In this paper, we present a statistical model that aggregates many manual pairwise comparisons to robustly measure a machine translation system’s performance. Our method applies graded response model from item response theory (IRT), wh...


Image encryption based on chaotic tent map in time and frequency domains

The present paper is aimed at introducing a new algorithm for image encryption using chaotic tent maps and the desired key image. This algorithm consists of two parts, the first of which works in the frequency domain and the second, in the time domain. In the frequency domain, a desired key image is used, and a random number is generated, using the chaotic tent map, in order to change the phase...


Selected Crowdsourced Translation Practices

This paper contains research related to workflow and design patterns. It briefly discusses the suitability of industry tools for crowdsourcing processes in terms of workflow pattern support. After listing a number of practices identified by analysing crowdsourced translation workflow models, the paper discusses four of the practices and presents two recommendations based on the scenarios of rea...


On Analytical Study of Self-Affine Maps

Self-affine maps were successfully used for edge detection, image segmentation, and contour extraction. They belong to the general category of patch-based methods. Particularly, each self-affine map is defined by one pair of patches in the image domain. By minimizing the difference between these patches, the optimal translation vector of the self-affine map is obtained. Almost all image process...




Publication date: 2013